A Data Cleansing Method for Clustering Large-Scale Transaction Databases

نویسندگان

  • Woong-Kee Loh
  • Yang-Sae Moon
  • Jun-Gyu Kang
چکیده

In this paper, we emphasize the need for data cleansing when clustering large-scale transaction databases and propose a new data cleansing method that improves clustering quality and performance. We evaluate our data cleansing method through a series of experiments. As a result, the clustering quality and performance were significantly improved by up to 165% and 330%, respectively.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ROCKET: A Robust Parallel Algorithm for Clustering Large-Scale Transaction Databases

We propose a robust and efficient algorithm called ROCKET for clustering large-scale transaction databases. ROCKET is a divisive hierarchical algorithm that makes the most of recent hardware architecture. ROCKET handles the cases with the small and the large number of similar transaction pairs separately and efficiently. Through experiments, we show that ROCKET achieves high-quality clustering ...

متن کامل

An Effective Clustering Algorithm for Transaction Databases Based on K-Mean

Clustering is an important technique in machine learning, which has been successfully applied in many applications such as text and webpage classifications, but less in transaction database classification. A large organization usually has many branches and accumulates a huge amount of data in their branch databases called multidatabases. At present, the best way of mining multidatabases is, fir...

متن کامل

Conceptual Clustering of Heterogeneous Distributed Databases

With increasingly more databases becoming available on the Internet, there is a growing opportunity to globalise knowledge discovery and learn general patterns, rather than restricting learning to specific databases from which the rules may not be generalisable. Clustering of distributed databases facilitates learning of new concepts that characterise common features of, and differences between...

متن کامل

A hybrid approach for database intrusion detection at transaction and inter-transaction levels

Nowadays, information plays an important role in organizations. Sensitive information is often stored in databases. Traditional mechanisms such as encryption, access control, and authentication cannot provide a high level of confidence. Therefore, the existence of Intrusion Detection Systems in databases is necessary. In this paper, we propose an intrusion detection system for detecting attacks...

متن کامل

A partition-based algorithm for clustering large-scale software systems

Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IEICE Transactions

دوره 93-D  شماره 

صفحات  -

تاریخ انتشار 2010